REVISED STEROID SEARCH SYSTEM CODING MANUAL


This manual supersedes Patent Office 
Research and Development Report No. 7, A 
Punched Card System for Searching Steroid 
Compounds and Report No. 11, A Manual for 
Coding Steroids. The present manual reflects 
the results of two years of experience by the 
Patent Office and the Industry in the use of a 
punched card system for conducting searches 
in the steroid art. 


SCOPE OF THE ART 
IN THE SYSTEM 


The system is limited to the steroid art. 
The patents included in the system are those 
which are classified in Class 260, sub-classes 
239.5, 239.55, 239.57, 397, 397.1, 397.2, 397.35, 
397.3, 397.4, 397.45, 397.47, and 397.5,
together with some steroid-containing patents
gleaned from other sub-classes. Only those
steroids disclosed in the patents which meet 
the definitions of these class and sub-classes 
of the Patent Office classification are included 
in the system. 


Every compound which is coded must contain
the steroid nucleus shown in Figure 1, in
which the positional locations of nuclear
substitution are identified by numbers assigned
to each position. The seco and homo steroids
are excluded. Also excluded are steroids
classified elsewhere according to the Rule of
Superiority in classification.


Figure 1 (the steroid nucleus with numbered
positions, including side-chain carbons 20
through 27)


Positional locations 18 through 23 are variables
in the sense that they may not be present
in a particular compound (e.g., androstanes).
The locations exist only when they are occupied
by carbon atoms.


RELATIONSHIP OF CODES AND 
PUNCH-CARD FORMAT 


Every significant term or substituent of 
the chemical steroid formula is called a 
Descriptor. 

A descriptor is defined by a single set of 
numbers designating a particular column and 
row of the standard 80-column IBM punched 
card. Conversely, each column-row location
on the IBM card is a fixed allocated space for a
particular descriptor. The card is divided into
two fields, columns 1 through 50 constituting
the field for 2A terms and columns 51-65
constituting the field for 2B terms. The specific
column and row for each descriptor can be
seen from the steroid code sheet, which contains
the column and row number for each descriptor.

The definitions of the descriptors on
the steroid code sheet follow infra.


REMARKS ABOUT CODING 


Broadly speaking, the arrangement of de- 
scriptors on the card is according to three 
general types of information, namely: 

(1) The "2-A" field (columns 1-50) in which
24 structural concepts plus miscellaneous may
be recorded as being present in any of positions
1-23, and "Ex." (The "Ex" designation
is an abbreviation for "Exo" and is used to
code substituents which are not directly connected
to the 1-23+ carbon atoms of the steroid
nucleus. See Appendix A.)

(2) The "2-B" field (columns 51-65) which 
is used primarily for recording structural 
concepts but without defining their position on 
the nucleus. 

For purposes of coding patents all of the 
steroid formulae disclosed within a patent are 
combined to produce a single synthetic formula 
(this process is called "compositing"). Each 
functional group appearing in this synthetic 
formula together with its position is punched 
into the 2-A and 2-B fields of the code format. 
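
The compositing step can be pictured as a set union. The following is a minimal sketch under stated assumptions: the representation of a descriptor as a (term, position) pair and the function name `composite` are illustrative and do not appear in the manual.

```python
# Illustrative sketch of "compositing": each formula disclosed in a
# patent is reduced to a set of (term, position) descriptors, and the
# patent's card carries the union of those sets.
# The tuple representation and all names here are hypothetical.

def composite(formulas):
    """Return the union of the descriptor sets of all formulas."""
    card = set()
    for descriptors in formulas:
        card |= set(descriptors)
    return card


# Two hypothetical formulas disclosed in one patent.
formula_a = {("-OH", 3), ("=O", 20)}
formula_b = {("-OH", 3), ("=O", 11), ("Hal", 9)}

patent_card = composite([formula_a, formula_b])
print(sorted(patent_card, key=str))
```

One consequence of the union is that a card may satisfy a query that combines descriptors which never occur together in any single disclosed compound.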

(3) Document identification and a card 
serial number. 

Columns 66-69 are used to record a number
which is a chronologically assigned serial
number to the cards in the patent deck. This
number will parallel the patent numbers, except
for reissue patents. Subscribers to the patent
deck can use this number to check their decks
for completeness.

Column 70 is used to record the Patent
Office classification of patents. See Appendix
C for the code used and the type of steroids in 
each category. The coding in this column may 
provide a useful search tool, but it should be 
borne in mind that the classification assigned 
by the Patent Office is based upon claimed 
subject matter, rather than subject matter 
disclosed. 

The Column 71-80 field is used to record 
the literature reference or patent number. 
See Appendix D for the coding used in these 
columns. 
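
The division of the card into fields, as described above, can be summarized in a small lookup. This is a sketch only; the table name `FIELDS` and the function `field_of` are illustrative and not part of the manual.

```python
# Column ranges of the 80-column card as described in this manual.
# FIELDS and field_of are illustrative names, not from the manual.
FIELDS = [
    (1, 50, "2-A"),
    (51, 65, "2-B"),
    (66, 69, "card serial number"),
    (70, 70, "Patent Office classification"),
    (71, 80, "literature reference or patent number"),
]


def field_of(column):
    """Name the field that a card column (1-80) belongs to."""
    for first, last, name in FIELDS:
        if first <= column <= last:
            return name
    raise ValueError(f"column {column} is outside the 80-column card")


for col in (15, 53, 66, 70, 80):
    print(col, field_of(col))
```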

It is important to note that where chemistry 
and information retrieval are at variance, for 
the purpose of this manual, information re- 
trieval takes precedence. 


INSTRUCTIONS FOR CODING 


1. Read the document for comprehension of 
the subject matter. 

2. Encircle all of the codes representing 
both the 2A and 2B terms found in the patent 
on the coding data sheet in accordance with 
the principles of multiple coding and composite 
coding. 

3. Extract all pertinent terms disclosed in 
the patent. The title, text, and claims of the 
patent are all parts of the disclosure for 
extraction and coding purposes. Chemical 
configurations disclosed as possible substitu- 
ents as well as those more specifically dis- 
closed are extracted and coded. 

4. Have the coded information verified by 
a second individual. Note that the coded data 
sheet represents a composite of all the sub- 
stituents disclosed for the steroid nucleus in 
the given patent. 


DEFINITION OF CODE TERMS 
2-A Terms


1. =


This symbol designates a carbon-to-carbon
double bond involving positions 1-23.
In coding the positional location of a double
bond, the lower number of the pair of position
numbers is recorded. Thus for Δ4,5, position
4 is coded, and for 17(20), code 17. The 2-A
field punches will not be definitive for positions
1, 5, 8, 9, 13 and 20; distinction between the
two variants in each of these positions is
provided in the 2-B field (Col. 51). The unsaturated
linkage of methylene groups connected
to the steroid nucleus is coded as "exo"
double bond only. Position 10 in the 2-A field
is not used.
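
The lower-number rule and the Col. 51 refinement can be sketched as follows. The row numbers are those legible on the code sheet for rows 0 through 9; the two variants for position 20 are left out because their rows are not clearly legible. The names `COL51_ROWS` and `code_double_bond` are illustrative assumptions.

```python
# Rows of column 51 as read from the code sheet (rows 0-9 only).
# COL51_ROWS and code_double_bond are illustrative names.
COL51_ROWS = {
    (1, 2): 0, (1, 10): 1,
    (5, 6): 2, (5, 10): 3,
    (8, 9): 4, (8, 14): 5,
    (9, 10): 6, (9, 11): 7,
    (13, 14): 8, (13, 17): 9,
}


def code_double_bond(a, b):
    """Return (position coded in 2-A, column-51 row or None).

    The lower position number of the pair is coded in 2-A. A
    column-51 row is returned only for the variants listed above.
    """
    low, high = min(a, b), max(a, b)
    return low, COL51_ROWS.get((low, high))


print(code_double_bond(4, 5))    # (4, None)
print(code_double_bond(20, 17))  # (17, None)
print(code_double_bond(5, 10))   # (5, 3)
```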


An aromatic A ring is coded 1, 3, 5 in the
2-A field, and in addition is coded 1(2), 5(10)
and aromatic A in the 2-B field.

Ring saturation is also indicated in the 2-B 
terms. Double bonds that are common to two 
rings are treated as follows: 


5(10) - A ring 
8(9) - C ring 
13(14) - C ring 


2. -H 


This symbol is applied only when the hydrogen
atom is attached to the 10, 13, or 20 carbon
atom to indicate the absence of a methyl group
in the 18, 19, or 21 position. For example,
when there is a 17 formyl group, -H is coded
at the 20 position. It is not used for a
17-methyl group.

NOTE: The hydrogen actually has to be
present to use this descriptor. Thus, in the
case of steroids containing an aromatic A ring
which has neither a methyl nor a hydrogen
attached to position 10, the 10-H code is not
used.


3. α or allo


This term represents a particular orientation
at an asymmetric steroidal carbon.

The α descriptor is implied and coded
wherever the configuration is known from the
trivial name of a steroid, e.g., "cholic acid" is
coded 3α 7α 12α and "cortisone" receives a
17α code.

When a steroidal carbon is substituted such
that one of the groups must be α oriented (no
hydrogens present on the carbon) the α descriptor
is automatically coded, e.g., a
20-cyanohydrin pregnane or any 17-disubstituted
steroid.

The 9α and 14α codes are assumed and not
coded unless the substituent is other than H.

Note that i compounds have a specific 2-B
punch and α or allo is not used to identify i
compounds. The i compounds are located in
2-B under "General."

A word of caution, however, in using the α
descriptor: since, in the absence of a specific
statement in the document, the coder may not
have known whether a particular substituent
was α or not, the descriptor may have been
omitted in some cases. It should not be used
when a complete search is desired.


4. β (beta)


This code is used only when the document
specifically states the β configuration. It is
never implied and therefore may not be relied
on for complete searches.

The 5-β code is not used unless the substituent
is other than hydrogen.

NOTE: Since coding the α and β descriptors
is sometimes arbitrary, they should not be
used if a complete search is desired.


5. Miscellaneous 


This designation is used for any group not 
provided for by any other 2-A term. 

Examples of groups in this category are
-O-Na, azides (consider -C-N3 as a unit), and
Grignard intermediates.

Rule of Superiority. The Misc. code is used
only if no other designation is applicable.
(Note: In cases of doubt between C-sub and 
Misc., the C-sub code takes precedence.) 


6. -OH


This represents the hydroxy group.


7. =O


This represents the keto group.


8. -O-Acyl


This designation refers to an ester group
attached to the steroid through the -O- atom of
the group. It is further defined in the 2-B
terms as follows:

Carboxylic - A carboxylic acid radical.

Poly - A polycarboxylic acid radical
(exo-COOR is not coded when poly is recorded).

Unsat - An unsaturated acid radical. The
double bond is not coded exo in 2-A. Aromatic
unsaturation is excluded (e.g., the benzoyl
group is not coded as an unsaturated acid
radical).

Subst - A hydrocarbon carboxylic acid
radical having a non-hydrocarbon substituent.
This term is applicable to poly, unsaturated,
aromatic, and aliphatic radicals. The carboxyl
group of a polycarboxylic acyl is excluded from
this category. Examples are:


Figure 2


coded as carboxylic, poly, subst, halo cont.,
"exo" halo, and "Cl" in 2-B. (53-0, 1, 3; 54-8;
25-Ex; 56-8)


Figure 3


coded as carboxylic, aromatic, subst, O-cont.,
and "exo" OH. (53-0, 3, 4; 54-7; 11-Ex)


Aromatic - A carboxylic acid radical con- 
taining an aromatic hydrocarbon ring. 

Aliphatic - An aliphatic, non-aromatic car- 
boxylic acid radical. 

St. Chain - A straight chain aliphatic acid 
radical with no non-hydrocarbon substituents 
(formate and acetate included). 

Cycloalkyl - An aliphatic hydrocarbon car- 
boxylic acid containing a cycloalkyl group (e.g., 
cyclopentylpropionate). 

Branched - An aliphatic branched hydrocarbon
carboxylic acid radical.

Heterocyclic - A heterocyclic containing 
carboxylic or inorganic acid radical (e.g., 
nicotinate). 

Inorganic Acyl - Includes the acyl group of 
any inorganic acid with the exception of the 
halogen acids. Note: This term includes the 
acyl radicals derived from carbonic acid, its 
alkyl esters, and its inorganic derivatives 
(e.g., phosgene), but does not include acyl 
radicals derived from carbamic or xanthic 
acids - these are coded as substituted aliphatic 
carboxylic acid radicals. 

Phosphorus A - An inorganic
phosphorus-containing acyl radical.

SO, (all S) - This term includes all inorganic
sulfur-containing acyl radicals. Examples
are the mesyl and tosyl radicals.


Figure 4


coded inorg, SO,, aliphatic, S-cont. and O-cont.
(53-11; 54-0, 3, 6, 7)


Figure 5


coded inorg, SO,, aromatic, S-cont., and O-cont.
(53-11; 54-0, 4, 6, 7)


Osmium - An inorganic osmium-containing 
acyl radical. 

The 4(5) osmate of progesterone exemplifies 
this term: 


Figure 6


The osmate radical is coded as 4, 5-O-acyl in
2-A and as inorganic, osmium, and O-containing
in 2-B. (15-4, 5; 53-11; 54-1, 7)


Boron - An inorganic boron-containing acyl 
radical. 

The 16, 17-cycloborate of 16α-hydroxyhydrocortisone
exemplifies this term.


Figure 7


The cycloborate radical is coded as exo-OH
and 16, 17 O-acyl in 2-A and as inorganic,
boron, and O-containing in 2-B. (11-Ex; 16-16,
17; 53-11; 54-2, 7)


Aliphatic - An acyclic inorganic acyl radical.

Aromatic - An inorganic acyl containing an
aromatic hydrocarbon ring.

N, S, O, and Halo-containing are coded when
an -O-Acyl radical, either organic or inorganic,
contains any one of these elements, excluding
the oxygen contained in the carboxyl groups of
organic acyl radicals. The O-containing descriptor
is coded in the case of SO,, PO, and
osmates. It is not coded for polycarboxylic
acids.

Miscellaneous - Any O-Acyl not provided 
for above. 

Note: As a result of the multiple coding 
principle, all applicable terms are applied to 
a particular compound. 
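
The parenthetical code strings used in the examples above, such as "(15-4, 5; 53-11; 54-1, 7)", follow a regular pattern: a column number, a hyphen, and one or more punches. The sketch below parses that notation. The function names are illustrative. The helper `columns_for_term` rests on an inference, not on a statement in the manual: the worked examples (11-Ex for exo -OH, term 6; 15-4, 5 and 16-16, 17 for -O-Acyl, term 8; 25-Ex for exo halogen, term 13) are all consistent with 2-A term n occupying columns 2n-1 and 2n.

```python
def parse_codes(text):
    """Parse e.g. '15-4, 5; 53-11; 54-1, 7' into (column, punch) pairs.

    Punches are kept as strings because "Ex" is a valid punch.
    """
    pairs = []
    for group in text.split(";"):
        column, _, punches = group.strip().partition("-")
        for punch in punches.split(","):
            pairs.append((int(column), punch.strip()))
    return pairs


def columns_for_term(term_number):
    """Assumed 2-A layout: term n occupies columns 2n-1 and 2n."""
    return (2 * term_number - 1, 2 * term_number)


osmate = parse_codes("15-4, 5; 53-11; 54-1, 7")
print(osmate)
print(columns_for_term(8))   # -O-Acyl is term 8
```

Under that assumption, every 2-A column in the four examples falls inside the column pair of the term it is said to encode.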


9. -O-R 


This defines the ether linkage, i.e., R-O-R,
where one R is a carbon which is a part of the
steroid nucleus (except for "Exo") and the other
R is a substituent (aliphatic, aromatic, cycloaliphatic,
or heterocyclic) which is attached to
the oxygen atom independently through a carbon
atom and which is further defined in 2-B by
the following terms:


R = hydrocarbon
R = N group, e.g., NH2-CH2-O-
R = S group, e.g., HS-CH2-O-
R = O group, e.g., HO-CH2-O-
R = halo group, e.g., F3C-O-
R = other


Note: Because they are specifically provided
for elsewhere, acetals and ketals are not
coded under O-R.


10. Epoxy 


This description refers to an epoxy group
attached to two nuclear carbon atoms. The two
positions to which the epoxy is attached are
recorded.

Since this term is closely related to the O-hetero
category the following rules are applied:

I. All carbons linked by a single oxygen atom
are coded as epoxy unless the oxygen is a
member of a lactone group, in which case
"epoxy" is not used. E.g.:


Figure 8 (structures (1) and (2))


are coded (1) 11, 18 epoxy and (2) 17, 20 
epoxy. 


II. Epoxides of non-adjacent carbons involving
one or two of the 20, 21, 22 and/or 23+
carbons are also coded as O-hetero. See
the O-hetero section for the proper coding
procedure in such cases. E.g.:


Figure 9 (structures (1) and (2))


are coded (1) 17, 21 epoxy; 17 - O-hetero, 
spiro, misc and (2) 18, 20 epoxy; 13, 17 - O- 
hetero-fused-furan. 


11. Ketal: 


This designation refers to the reaction 
product of a keto or aldehyde group with an 
alcohol to give either a cyclic or non-cyclic 
ketal or acetal. 


The term ketal includes thioketal, semithioketal,
acetal, thioacetal and hemithioacetal.


Non cyclic ketals (acetals) should also be
coded under the 2-B term "Bis-substituents."


Cyclic oxygen ketals are not coded in the 
O-hetero category in 2-A or in 2-B. Cyclic 
thioketals, including hemithioketals are not 
coded under S-hetero in 2-A but are coded as 
thioketal in the 2-B S-hetero column. 

Non cyclic ketals (acetals) are not coded 
under O-R or S-R in 2-A or 2-B. 


12. O-Hetero 


This designation refers to an oxygen- 
containing heterocyclic group. The heterocycle 
can be attached to the steroid nucleus through 
any atom of the heterocycle. 

The heterocycle may be fused, independent,
or spiro. When it is fused to the nucleus, the
two nuclear positions (which need not be
adjacent) to which it is fused are recorded.
When it is spiro or independent the one position
through which it is attached is recorded.


Spiro


Figure 10


Code: 17-O-hetero; pyranyl; spiro


Fused


Figure 11


Code: 1,2-O-hetero; furan; fused


Independent


Figure 12


Code: 16-O-hetero; independent; misc.


Acid and anhydride adducts (e.g., maleic 
anhydride adducts), epoxides, cyclic oxygen 
containing ketals and steroidal sapogenins are 
specifically excluded from the 2-A term O- 
hetero. 

When the 18-23+ side chain carbons are
members of the hetero ring, the point of
attachment to the steroid ring is recorded
unless the hetero ring is independently connected
to a side chain carbon. E.g.:


Figure 13


Code: 17-O-hetero; independent; lactone; furan


Figure 14


Code: 13, 17-O-hetero; fused; furan
(Note: Also code 18, 20-epoxy)


Figure 15


Code: 11, 13-O-hetero; fused; lactone; furan


The O-hetero is further defined in 2-B as
follows:

Morpholine - this is also coded as N-hetero

Furan - includes saturated and unsaturated
forms and also 5-member lactones

Lactone - E.g.:


Figure 16


is coded 16, 17 O-hetero, furan, lactone, fused.


Spirostane - includes both normal and iso.
E.g.:


Figure 17 (diosgenin)


The above formula is coded as "spirostane"
(no other O-hetero descriptors in 2-A or 2-B
are used).


Sub in O-Spiro Ring - spirostanes substituted
in the oxygen rings by a non-hydrocarbon substituent.
The substituent is also coded as
"exo."


NOTE: The 2-A designations 16, 17-O-hetero
and 2-B fused are not recorded for
sapogenins or for pseudo-sapogenins.


Pseudosapo - the pseudosapogenins are
derived from steroidal sapogenins by treatment
with acid anhydrides. The free hydroxy analog
is included in this descriptor. The side chain
hydroxy or acetate is not coded "exo." The
double bond in the fused O-hetero ring is not
coded as "exo" - it is included in the term
"pseudosapogenin." E.g.:


Figure 18 (pseudodiosgenin acetate)


is coded as "pseudosapogenin."


Acetonide - cyclic acetal or ketal which is
the reaction product of an aldehyde or ketone
with 2 hydroxy groups attached to the steroid
nucleus. It is also coded as "fused." E.g.:


The acetonide portion of


Figure 19


(16α, 17α isopropylidenedioxyprogesterone)


is coded 16, 17 O-hetero, acetonide, fused.

NOTE: The methyl groups may be replaced
by hydrogen or any other substituent (hydrocarbon
or substituted hydrocarbon, heterocyclic,
halogen, etc.) or by substituents which
themselves form a ring.

Peroxide - The C-O-O-C linkage. This is 
limited to peroxide in ring configuration only. 
The -C-O-O-H group is coded as "Misc." in 
2-A. The peroxide descriptor does not include 
"ozonide" which is coded as O-hetero-fused- 
misc. E.g.: the peroxide group of 


(ergosterol peroxide)
Figure 20

is coded 5, 8 O-hetero, peroxide, fused.

Pyranyl - includes saturated and unsaturated
forms

Spiro, Fused and Independent - these terms 
refer to the manner in which the heterocycle 
ring is attached to a carbon atom or atoms of 
the steroid nucleus. 

Miscellaneous - O-hetero substituents not 
specifically provided for above but specifically 
excluding epoxides, ketals, and acid and 
anhydride adducts. 


13. Hal - 


Halogen is coded wherever it appears except
in acid halides (see COOR). It is further
defined in 2-B (F, Cl, Br, I).


14. -S(Se, Te)-R 


Any substituent joined to the steroid nucleus 
by attachment through a sulfur, selenium, or 
tellurium atom. S - containing heterocyclics 
are excluded. 

It is further defined in 2-B as follows: 

-Se-R The substituent is joined to the 
steroid nucleus through a selenium atom. 

-Te-R The substituent is joined to the 
steroid nucleus through a tellurium atom. 

=S(Se,Te) =S, =Se, or =Te is the substitu- 
ent, and is attached by a double bond. 

R=H Used for -S-H group. 

R=other This includes all -S-R groups
where R is not H, including -SO2H, -SO3H, etc.

In using the exo 2-A term, =S(Se,Te) is specifically
excluded where the =S(Se,Te) is part
of an acyl group. Also specifically excluded
are NCS and SCN, which are provided for elsewhere.


15. S-het Ring 


This designation refers to any sulfur-containing
heterocyclic group attached to any
of the 1 to 23+ positions of the steroid nucleus
through any of the atoms of the heterocycle.

The same general rules for coding rings are 
followed that were recited under O-hetero. 

This category is further defined in 2B as 
follows: 

Thiophene - the S-hetero in thiophene con- 
figuration, saturated or unsaturated. 

Thiazole - the S-hetero in thiazole con- 
figuration, saturated or unsaturated. (Also 
coded as N-hetero.) 

Thioketal - includes both cyclic monothioketals
and cyclic dithioketals. The 2-A
S-hetero descriptor is not used for "thioketal."
The appropriate position is coded under "ketal."

Spiro, Fused, and Independent have been 
defined above. (See O-hetero.) 


STEROID DATA (code sheet)

2-A field (columns 1-50): positions 1 through 23+
and Ex for each 2-A term. Sheet headings: Hetero
Element or Hetero Ring; C or C-Ring Attached to
Nucleus.

2-B field (columns 51-65), row and descriptor:

Col. 51   = (term 1)
  0   1(2)
  1   1(10)
  2   5(6)
  3   5(10)
  4   8(9)
  5   8(14)
  6   9(10)
  7   9(11)
  8   13(14)
  9   13(17)
  11  20(21)
  12  20(22)

Col. 52
  0   Aromatic A
  1   Sat. ring A
  2   Sat. ring B
  3   Sat. ring C
  4   Sat. ring D
      Unsubstituted
  5   Ring A
  6   Ring B
  7   Ring C
  8   Ring D

Col. 53   -O-Acyl (term 8)
  0   Carboxylic
  1   Poly
  2   Unsat.
  3   Subst.
  4   Aromatic
  5   Aliphatic
  6   St. chain
  7   Cyclo-al
  8   Branched
  9   Heterocyclic
  11  Inorganic acyl
  12  Phosphorus A

Col. 54   -O-Acyl (continued)
  0   SO, (all S)
  1   Osmium
  2   Boron
  3   Aliphatic
  4   Aromatic
  5   N containing
  6   S containing
  7   O containing
  8   Halo containing
  9   Misc.

Col. 55   -O-R (term 9)
  0   R=hydrocarbon
  1   R=N group
  2   R=S group
  3   R=O group
  4   R=Hal group
  5   R=Other
          O Hetero ring (term 12)
  6   Morpholine
  7   Furan
  8   Lactone
  9   Spirostane
  11  Sub in O spiro
  12  Pseudosapo.

Col. 56   O Hetero ring (continued)
  0   Acetonide
  1   Peroxide
  2   Pyranyl
  3   Spiro
  4   Fused
  5   Independent
  6   Misc.
          Hal (term 13)
  7   F
  8   Cl
  9   Br
  11  I

Col. 57   -S(Se,Te)-R (term 14)
  0   -Se-R
  1   -Te-R
  2   =S(Se,Te)
  3   R=H
  4   R=Other
          S Het Ring (term 15)
  5   Thiophene
  6   Thiazole
  7   Thioketal
  8   Spiro
  9   Fused
  11  Independent
  12  Misc.

Col. 58   N-R-R (term 16)
  0   Primary amine
  1   Secondary amine
  2   Tertiary amine
  3   Quaternary amine
  4   NO2, NO & Misc.
  5   R=acyl

Col. 61   HC Ring (term 22)
  0   3 M
  1   4 M
  2   5 M
  3   6 M
  4   7 M
  5   Sat
  6   Unsat (N. Arom)
  7   Aromatic
  8   Spiro
  9   Fused
  11  Independent

Col. 62   COOR (term 24)
  0   R=H
  1   R=salts (Met, Am)
  2   R=alkyl
  3   -C(=O)-X (X=hetero)
  4   -C(=X)-O (X=hetero)
          -C-R (term 25)
  5   R=N group
  6   R=S group
  7   R=O group
  8   R=Halogen group
  9   R=Other
  11  Oxalyl

Cols. 63-65 (row numbers not legible): Carbons at
17; 3-OR; Bile acids (3-5); Bile (N.A.) (3-5);
Vitamin D; Isopregnane α C-C; Bis Subst. (Same):
At C(17), At same C (not 17); General: Addition,
Maleic adduct, CNO, NCO, NCS, SCN, 21 Diazo,
Radioactive, i compounds, Microbiological
Oxidation, Ext. of natural materials.

Cols. 66-69, Col. 70 (P.O. Classification), and
Cols. 71-80 complete the card.


Miscellaneous - S-hetero substituents not
specifically provided for above. The thiirane
ring (a three-membered ring of two carbons
and one sulfur) is coded here.


16. N-R-R 


This defines a nitrogen containing group 
connected through its nitrogen atom to the 
steroid nucleus. The term is not applicable 
when the nitrogen atom is part of a hetero- 
cyclic ring. 

The 2-B descriptors for N-R-R are: 

Primary, Secondary, Tertiary and Quater- 
nary Amines are self-explanatory. E.g.: 


Figure 21


is coded 3-NRR in 2-A and "secondary
amine" in 2-B.


NO2, NO and Misc. - This term includes
NO2, nitroso and all NRR groups not covered
by the other definitions in 2-B (e.g., nitrone,
nitrate).

R = Acyl - converts the amine group to
amides.

Imino - defines an imino group.

NOTE: =N - Ketone reagents are specifically
excluded from N-R-R.


17. N-hetero Ring 


This designation refers to a nitrogen- 
containing heterocyclic group attached through 
any of the atoms of the heterocyclic group. 

The heterocycle can be fused, independent, 
or spiro. The rules set forth under O-hetero 
are followed. 

The 2-B terms further define as follows: 

Morpholine - (also coded as O-hetero) 

Piperidine - 

Pyridine - (including the dihydro and tetra- 
hydro forms) 

Pyrimidine - (including all saturated forms) 

Azole - A 5-membered saturated or unsaturated
ring containing at least one nitrogen.
This is a generic term and all azoles are coded
here.

Pyrrolidine - (Pyrrole) (including all saturated
and unsaturated forms) - NOTE: Azole
is also coded whenever this descriptor is used.

Thiazole - (including the saturated and unsaturated
forms) - NOTE: Azole is also coded
whenever this descriptor is used.

Piperazine - 

Spiro, Fused and Independent have been 
defined above (see O-hetero) 

Miscellaneous - N-hetero substituents not 
specifically provided for by the terms above. 


18. Keto Reagent 


This designation refers to a reaction product
of a ketone or aldehyde group attached to any
of the positions 1 to 23+ with well-known keto
reagents.

It is further defined in 2-B as follows:

Hydrazone, Oxime and Semi-Carbazone are
self-explanatory. Thiosemicarbazone is coded
as semicarbazone and exo "S." When uncommon
exo groups are present they are recorded
(e.g., in dinitrophenyl hydrazone the nitro
groups would not be coded; but in di-cyano
phenylhydrazone the cyano groups would be
coded). The above should be considered when
attempting to find substituted keto reagents.

Girard Reagent - includes all acyl hydra- 
zones substituted on the acyl moiety by a qua- 
ternary ammonium radical. Girard reagents 
are not coded as hydrazones and are not split 
into "exo" terms, except for halogen. 

NOTE: Substituents coded here are not 
coded under NRR. 


19. CH3


This represents a methyl group. This code
is not used at positions 10, 13, 20, 22, or 23+
because methyl groups in these positions are
considered integral parts of the steroid nucleus.
When a methyl group is coded, the 2-A
term "hydrocarbon chain" and the 2-B term
"lower alkyl (1-7)" are also coded. The
"exo"-CH3 descriptor is not used.


20. -C≡C-


This represents the ethinyl linkage. When
this code is used "hydrocarbon chain" in 2-A
and "alkinyl" in 2-B are also coded.


21. HC (Hydrocarbon) Chain 


This term represents a hydrocarbon chain.
Methyl groups at the 10 and 13 positions are
not coded (see the -CH3 descriptor). Also, in
pregnanes, the chain of 2 carbons attached to
the 17 position is not coded as a hydrocarbon
unless it is unsubstituted (i.e., -CH2-CH3). In
cholesterols and other sterols, the hydrocarbon
chain in the 17, 20 and 22 positions is not coded
as hydrocarbon.

Hydrocarbon chain includes aralkyl or any 
other cyclic hydrocarbon ring attached to the 
steroid nucleus through an aliphatic carbon 
chain but the carbons of the ring are not 
counted as chain members. 

This descriptor is not used at positions 20,
22, or 23+ (except in the case of 20-, 22-, or
23+-ethinyl) since carbons attached to these
positions are integral parts of the steroid
nucleus.

"Exo"-HC Chain is not coded except when
used in conjunction with the "exo" -C≡C- descriptor.

The 2-B definitions are:

L. Alkyl (1-7) - a chain of 1 to 7 carbons

Hi. Alkyl (8+) - a chain of 8 or more carbons

=CH2 - this includes both H and unsubstituted
hydrocarbon substituents attached to =C<.

Alkenyl - if the HC chain contains a double
bond other than at the point of attachment to
the steroid nucleus this code is used.

With Aromatic - an aliphatic side chain
substituted by an aromatic hydrocarbon ring.

Alkinyl - if the HC chain contains a triple
bond this descriptor is used.
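
The chain-length split and the accompanying codes for methyl and ethinyl can be sketched as follows. This is an illustration only: the function names, the dictionary `IMPLIED`, and the plain-text keys "CH3" and "ethinyl" are assumptions made for the example. Ring carbons are not counted as chain members, so the count passed in is the number of aliphatic chain carbons.

```python
def alkyl_class(chain_carbons):
    """Classify a hydrocarbon chain by its number of chain carbons."""
    if chain_carbons < 1:
        raise ValueError("a chain has at least one carbon")
    return "L. Alkyl (1-7)" if chain_carbons <= 7 else "Hi. Alkyl (8+)"


# Codes that the manual says accompany a 2-A term when it is coded.
# Keys and labels are illustrative spellings.
IMPLIED = {
    "CH3": ["HC Chain", "L. Alkyl (1-7)"],
    "ethinyl": ["HC Chain", "Alkinyl"],
}


def expand(term):
    """Return the term followed by the codes recorded along with it."""
    return [term] + IMPLIED.get(term, [])


print(alkyl_class(1))
print(alkyl_class(8))
print(expand("CH3"))
print(expand("ethinyl"))
```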


22. Hydrocarbon Ring 


This term includes any hydrocarbon ring,
including those having non-hydrocarbon substituents,
attached to the nucleus in positions
1-23+. In the case where the ring is fused to
the nucleus the positions on the ring to which
it is attached are recorded. The "exo"-HC
ring descriptor is not used.

The 2-B definitions are:

3 M, 4 M, 5 M, 6 M, and 7 M all denote the
number of carbon atoms in the ring (M =
members).

Sat - used if the ring is saturated. 

Unsat (N. Arom.) - used if the ring is un- 
saturated but not for aromatic rings. 

Aromatic - 

Spiro, Fused, and Independent are described 
under O-hetero. 


23. CN


This represents the nitrile (cyano) group.


24. COOR


This term represents carboxylic acid radicals
and their salts, esters, and amides. It
also includes thioesters, thioamides, and
halides. COOR is considered as a unit and the
-OH, =O, -NH2, etc. portions are not coded
elsewhere. The 2-B terms are:

R=H - the carboxyl group

R=salts - metal and amine salts of the carboxyl
group

R=alkyl - esters of the carboxyl group with
any alcohol, including aromatic alcohols.

-C(=O)-X, X=hetero - X includes any hetero
atom, the most common of which are nitrogen
(amides) or the halides.

-C(=X)-O, X=hetero - most commonly X will be
sulfur. (NOTE: in the case of dithio acids both
of the latter codes are recorded.)


25. -C-R (C-subs) 


This symbol represents a non-hydrocarbon
substituent linked through a carbon atom to the
steroid nucleus and not specifically provided
for by any other 2-A term.

In cases of doubt the C-sub is superior to
"Miscellaneous" and if a group can be coded in
C-sub, it is coded there rather than in
"Miscellaneous."

Substituted cycloalkyls attached to the nucleus
are coded HC-ring and not as C-sub.
The substituting groups are coded as "exo."

The substituted carbon group is further
defined in 2-B:

R=N containing group

R=S containing group

R=O containing group

R=halogen containing group

R=other

R=oxalyl. This defines the -C(=O)-C(=O)- group.
(21-oxalyl is also coded 4-carbons at 17; the
sodium enolate is coded 21-double bond and
21-C-sub, R=O containing group.)

NOTE: The exo C-sub descriptor is not
used. In these cases the radical that is substituted
on the carbon chain is coded as exo.
Also C-sub is not employed at the 10, 13, 17,
20, 22 and 23+ positions. For example:


Figure 22 (CH2-NH2 substituent)


Code 7-C-sub in 2-A and N-cont. group in 2-B
(as well as "exo" N-R-R and primary amine)


Figure 23 (CH2Cl substituent)


Code 21-C-sub in 2-A and "halogen group" in
2-B. (Also code "exo" halo, chlorine, and 3
carbons at 17.)


The foregoing 25 categories are all 2-A 
terms, some of which were further defined in 
2-B. 


CODING OF 2-B TERMS 


The following are found only in the 2-B 
section and are not specific to any of the 2-A 
terms. 


Unsubstituted 


Ring A - no substituents at any of the positions
1, 2, 3, 4, 5, and 10 in any one of the
compounds of the document. (10-methyl is not
a substituent.)


Ring B - positions 6 and 7 

Ring C - positions 8, 9, 11, 12, 13, and 14 
(13 methyl is not a substituent). 

Ring D - positions 15, 16, and 17 (the 17- 
side chain in pregnanes, sterols, etc., is con- 
sidered a substituent). 


Carbons at 17 


Only the largest number of carbons in a 
single group at 17 is counted. Carboxy and 
cyano carbons are included. 

The side chain may be straight or branched. 
Steroidal sapogenins are not considered as 
having a 17-side chain. No carbons that are 
members of cyclic groupings are counted here. 
E.g.:


Figure 24


is coded as 3 carbons at 17.


NOTE: When 63-0 is coded, one of 63-1 
and 63-2 must also be coded. Also, when 63-4 
is coded, 63-3 must be coded. When any of 
63-9, 11, or 12 or 64-0 or 1 are coded there 
must be an accompanying 63-8 code. 
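
The co-occurrence rules in the note above lend themselves to a mechanical check of a coded card. The sketch below is one possible reading, with two assumptions: a card is represented as a set of (column, row) pairs, and "one of 63-1 and 63-2" is taken to mean at least one of them. The function name `check_cooccurrence` is illustrative.

```python
def check_cooccurrence(punches):
    """Check the column 63/64 rules; return a list of violations.

    punches is a set of (column, row) pairs, e.g. {(63, 0), (63, 1)}.
    """
    errors = []
    if (63, 0) in punches and not ({(63, 1), (63, 2)} & punches):
        errors.append("63-0 requires 63-1 or 63-2")
    if (63, 4) in punches and (63, 3) not in punches:
        errors.append("63-4 requires 63-3")
    dependents = {(63, 9), (63, 11), (63, 12), (64, 0), (64, 1)}
    if dependents & punches and (63, 8) not in punches:
        errors.append("63-9, 11, 12 and 64-0, 1 require 63-8")
    return errors


print(check_cooccurrence({(63, 0)}))
print(check_cooccurrence({(63, 0), (63, 1)}))
print(check_cooccurrence({(64, 1), (63, 8)}))
```

A check of this kind could support the verification by a second individual called for in the coding instructions.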


0, 1 (Androstane) is self-explanatory
0 C is self-explanatory
1 C is self-explanatory
2 C (Pregnane) is self-explanatory


Note: 21 Norprogesterone is coded as a
pregnane (also as 20-keto).


21 Unsubst. is self-explanatory

6+ no 3-OR on nucleus - this term represents
sterols having no substituent at the 3-position
or some substituent other than oxygen at said
position.

Bile acids (3-5) (includes bisnorcholanic, 
norcholanic and cholanic acid side chains) 

Bile (N.A.) (3-5) All compounds which have
a chain of 3-5 carbon atoms in bile acid
configuration in the 17 position, but definitely
excluding the acid. Bile (Non Acid) contains
one of the following groups but excludes
everything which has been defined under COOR
previously:


C         C           C
|         |           |
C-C       C-C-C       C-C-C-C

Figure 25

e.g.,

C
|
C-C-OH

is coded Bile N.A.

Figure 26


Vitamin D - all members of the Vitamin D 
family are included even though they are not 
steroids. 

Isopregnane α C-C is coded when there is a
saturated, substituted 2 carbon chain in alpha 
configuration at the 17 position. 


Bis Subst. (Same) 


This is used when a steroidal carbon is
attached to two identical groups. It is not used
to record bismethyl substitutions in the side
chain, as in the case of the 26 and 27 methyl
groups of cholesterol, nor is it used for
hydrogen atoms or for symmetrical spiro rings (e.g.,
ketals). It is further divided into two specific
descriptors:

At C(17) 

At same C (not 17) 

Example: lanosterol is coded bis "at same 
C (not 17)" for the 4, 4 bismethyl groups. 


M General (Misc.) 


Addition - This term designates addition
compounds such as bisulfate addition products,
hydrates, Grignard addition compounds, etc. 
(Also code under 2-A Miscellaneous when 
possible.) 

Maleic adducts - Steroid reaction products
of the type of maleic acid anhydride or ester
adducts. Note: Maleic adducts take precedence
over all other codes and are coded only as
such, except for nuclear substituents. No other
codes are used for the adduct.

CNO, NCO, NCS, SCN - Self-explanatory
(also code in 2-A under "Miscellaneous").

21 Diazo - Self-explanatory (not coded
21-N-R-R or 21-Misc.).

Radioactive - This includes all steroids that
are radioactive or have radioactive isotopes 
as part of their structures. 

i compounds - Self-explanatory, not con- 
sidered as a fused compound. 

Microbiological - This describes a chemical
process carried out by microbiological meth- 
ods. 

Oxidation, Reduction and Unsaturation are 
specific under "Microbiological" for particular
processes. Both generic and specific terms 
are coded. 

Ext. of natural materials is used whenever 
the document recites a natural material ex- 
traction process. 


ALTERNATIVE CODING PROCEDURE— 
NON-COMPOSITE METHOD 


A modification of the coding system de- 
scribed above can be employed when it is 
desired to provide machine selection and dis- 
crimination on an individual compound basis. 
This modification is in contrast with the method
of composite coding. 

In individual compound coding, the 2A
substituents and 2B terms disclosed for the
particular compound are coded in the usual manner.
The absence of 2A terms in the remaining
positional locations on the steroid nucleus is
indicated by employing the "H" descriptor for
each of such locations. The only exception is
in the 17 position, in which the punching of an
"H" signifies the presence of only one
substituent instead of two. When the keto group
is in positional location 17, no "H" is punched.
This device is not used for positional locations
20, 21 and 22.

To find those compounds which do not have 
double bonds in certain positions, the absence 
of a double bond is asked for by an appropriate 
wiring modification. 


APPENDIX A - NOTES ON THE USE OF "EXO"


The designation "Ex" represents
substituents or pairs of double bonded carbon atoms
which are not directly connected through
carbon-to-carbon linkages to the 1-23+ carbon
atoms of the steroid nucleus. For example,
the carboxyl substituent in cholic acid


Figure 27 (cholic acid; the carboxyl carbon is numbered 24 (23+))


is directly connected, and is coded as 23+ 
COOR. The sulfo radical in taurocholic acid 


Figure 28 (taurocholic acid; side chain ending 24(23+)-NH-C-C-S-OH)


and the carboxyl radical in glycocholic acid


Figure 29 (glycocholic acid; side chain ending 24(23+)-NH-C-COOH)


are not directly connected, and are coded as 
Ex-S-R and Ex-COOR respectively. 


Note that in each of these compounds, the
2A term 23+ COOR and the 2B term 62-3
(-C-X, X = hetero) are also coded to
represent the carboxylic acid derivative
substituents at the 23 positions.
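
The "Ex" test described above amounts to asking whether a carbon can be reached from the numbered skeleton by crossing carbon-to-carbon bonds only. A minimal sketch follows, assuming a hypothetical representation of the molecule as a bond list plus an element table; the atom labels and function names are illustrative and do not come from the manual.

```python
from collections import deque


def carbon_linked_atoms(bonds, elements, nucleus):
    """Atoms reachable from the nucleus through carbon-to-carbon bonds only."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = set(nucleus)
    queue = deque(nucleus)
    while queue:
        atom = queue.popleft()
        for nxt in neighbours.get(atom, ()):
            if nxt not in seen and elements[nxt] == "C" and elements[atom] == "C":
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_exo(carbon, bonds, elements, nucleus):
    """True if `carbon` (the carbon bearing or forming the substituent)
    is not joined to the nucleus through carbon-to-carbon linkages."""
    return carbon not in carbon_linked_atoms(bonds, elements, nucleus)


# Tail of a glycocholic-type side chain: C23-C24-N-Ca-Cb, where Cb is the
# carboxyl carbon. "C23" stands in for the whole numbered skeleton.
elements = {"C23": "C", "C24": "C", "N": "N", "Ca": "C", "Cb": "C"}
bonds = [("C23", "C24"), ("C24", "N"), ("N", "Ca"), ("Ca", "Cb")]

print(is_exo("C24", bonds, elements, ["C23"]))  # False
print(is_exo("Cb", bonds, elements, ["C23"]))   # True
```

In this example carbon 24 is reached directly, while the carboxyl carbon beyond the nitrogen is not, which matches the Ex-COOR coding given for that radical.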


The designation "Ex" is not coded in the 
following 2A columns: 
H ; B . Misc. ; 


fo ; 
CH3; HC Ring; HC Chain*; -C-sub.


*But Note: alkinyl (-C≡C-) is coded as an "Ex"
term, and when it is, HC Chain is also
coded.


APPENDIX B - NOTE ON CODING AT 10, 13, 18 AND 19 


When the methyl groups at positions 10 and
13 are substituted, the substituents are coded
from positions 18 and 19, unless these positions
are part of a ring, in which case they are coded
from positions 10 and 13.


A few examples will serve to illustrate
this:


19 


Figure 30 


is coded 19-methyl (not 10-HC chain) 


Figure 31 


is coded 19-C-sub (not 10-C-sub) 


Figure 32 


is coded 10-11-O-hetero 
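
The rule of this appendix reduces to a two-way choice, sketched below. The function name and arguments are illustrative assumptions; the pairing of carbon 19 with position 10 and carbon 18 with position 13 follows conventional steroid numbering.

```python
# Carbon 19 is the methyl on position 10; carbon 18 is the methyl on 13.
METHYL_CARBON = {10: 19, 13: 18}


def coding_position(ring_position, methyl_in_ring):
    """Position from which a substituted angular methyl is coded."""
    if ring_position not in METHYL_CARBON:
        raise ValueError("rule applies only to positions 10 and 13")
    return ring_position if methyl_in_ring else METHYL_CARBON[ring_position]


print(coding_position(10, methyl_in_ring=False))  # 19
print(coding_position(10, methyl_in_ring=True))   # 10
```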


APPENDIX C


Column 70 records the Patent Office
Classification of patents coded. (See reference 3.)


70      P.O. Classification
        Class 260, subclass:

 1      239.5
 2      239.55
 3      239.57
 4      397
 5      397.1
 6      397.2
 7      397.25
 8      397.3
 9      397.4
10      397.45
11      397.47
12      397.5


APPENDIX D


Columns 71-77   : patent number.

Column 78       : code designating the country
                  of patent origin; see Table I
                  for particular codes.


For published literature, the remaining ten
columns are allocated as follows:


Column 71       : code representing the journal.

Column 72       : code designating the country
                  of journal origin; see Table II
                  for the combined journal-country
                  codes.

Column 73       : year of journal; see Table III
                  for particular codes.

Column 74       : month or volume of journal, the
                  month taking precedence. The
                  volume is an arbitrarily assigned
                  number: 1 for 1958, 2 for 1959,
                  etc.

Columns 75-78   : page numbers of journal article.

Columns 79, 80  : unique arbitrarily assigned
                  number to each compound coded
                  in the journal article.
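
The allocation of columns 71 through 80 for published literature can be pictured as a simple mapping from column number to punched value. The sketch below is an assumption-laden illustration: the function and argument names are invented, the journal, country and year codes are taken as already looked up, a single page number is used, and zero-padding of the page and compound numbers is assumed rather than stated in the manual.

```python
def literature_columns(journal_code, country_code, year_code,
                       page, compound_no, month=None, volume=None):
    """Map card columns 71-80 to the values punched for a journal article.

    All argument names are illustrative. How each value is physically
    punched within its column is not modelled here.
    """
    if month is None and volume is None:
        raise ValueError("either month or volume is required")
    if not 0 <= page <= 9999:
        raise ValueError("page must fit in four columns")
    if not 0 <= compound_no <= 99:
        raise ValueError("compound number must fit in two columns")

    columns = {
        71: journal_code,
        72: country_code,
        73: year_code,
        # The month takes precedence over the volume.
        74: month if month is not None else volume,
    }
    # Columns 75-78: page number, one digit per column (padding assumed).
    for col, digit in zip(range(75, 79), f"{page:04d}"):
        columns[col] = digit
    # Columns 79-80: compound number within the article (padding assumed).
    for col, digit in zip(range(79, 81), f"{compound_no:02d}"):
        columns[col] = digit
    return columns


card = literature_columns("3", "U", 2, page=1234, compound_no=7, volume=5)
print(card[74], card[75], card[78], card[79], card[80])  # 5 1 4 0 7
```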


Table I


CODES DESIGNATING NATIONAL ORIGIN
(Punch in Column 72)


A    Austria
B    Great Britain
C    Canada
D    Denmark
E    India
F    France
G    Germany
H    Switzerland
I    Italy
J    Japan
K    Australia
L    China
N    Netherlands
Q    Spain
R    Russia
U    United States
Z    Czechoslovakia


Table II
REVISED LISTING OF CODES IDENTIFYING JOURNAL AND NATIONAL ORIGIN


(NOTE: This revised list includes all corrections and additions to date. 
Reference to previous lists should be ignored.) 


A - AUSTRIA 


Monatshefte fur Chemie und Verwandte Teile anderer Wissenschaften 


B - GREAT BRITAIN 


Manufacturing Chemist 
Biochemical Journal 
Journal of the Chemical Society 
Chemistry and Industry 
*Journal of Pharmacy and Pharmacology 
Tetrahedron Letters (Pergamon Press Ltd.) 
British Medical Bulletin 
Nature 
Current Chemical Papers 
Proceedings of the Chemical Society of London 
Tetrahedron (The International Journal of Organic Chemistry) Pergamon Press 


C - CANADA


Canadian Journal of Biochemistry and Physiology 
Canadian Journal of Chemistry 


D - DENMARK 
Acta Endocrinologica 
Acta Chemica Scandinavica 
E - INDIA 
Journal Indian Chem. Soc. 
F - FRANCE 


Comptes Rendus de l'acad. des Sciences 

Comptes rendus des seances de la societe de biologie et de ses filiales 
Bulletin de la societe chimique de France 

Annales d'endocrinologie 

Annales Pharmaceutiques Francaises 


G - GERMANY 


Chemische Berichte 

Angewandte Chemie 

Zeitschrift fur Naturforschung 

Annalen der Chemie, Justus Liebigs
Naturwissenschaften 

Zeitschrift fur physiologische Chemie (Hoppe-Seyler's)
*Zeitschrift fur Chemie


H - SWITZERLAND 
*Chimia 
Helvetica Chimica Acta 
Experientia 


I - ITALY

Annali Di Chimica
Gazzetta Chimica Italiana
*Il Farmaco Edizione Scientifica

J - JAPAN 


Proceedings Japan Academy 

Journal of Biochemistry 

Journal of the Chemical Society of Japan 

Journal of the Agricultural Chemical Society of Japan 

Agricultural and Biological Chemistry (formerly Bulletin of the Agricultural Chemical Society) 
Hiroshima Journal of Medical Science 

Journal of the Pharmaceutical Society of Japan 

Yonago Acta Medica 

Chemical and Pharmaceutical Bulletin (formerly Pharmaceutical Bulletin) 

Bulletin Chemical Society 


K - AUSTRALIA

Australian Journal of Chemistry


L - CHINA

Acta Chimica Sinica
Scientia Sinica-Academia Sinica


M - BELGIUM

*Bulletin des Societes Chimiques Belges


N - NETHERLANDS

Recueil des Travaux Chimiques des Pays-Bas


Q - SPAIN


Anales de Fisica y Quimica


R - RUSSIA


Doklady Akademii Nauk Soyuza Sovetskikh Sotsialisticheskikh Respublik
Izvestiya Akademii Nauk Soyuza Sovetskikh Sotsialisticheskikh Respublik Otdelenie
Khimicheskikh Nauk (Classe des sciences chimiques)
Voprosy Med. Khimii
Zhurnal Obshchei Khimii (Journal Gen. Chem. USSR)
*Meditsinskaya Promishlennost SSSR
Ukrainskii Khimicheskii Zhurnal


U - UNITED STATES


Archives of Biochemistry and Biophysics Academic Press
Journal of Biological Chemistry
Journal of the American Chemical Society
Endocrinology
Journal of Clinical Endocrinology and Metabolism
Journal of the American Pharmaceutical Association
*Chemical and Engineering News
Journal of Pharmaceutical Sciences
Proceedings of the Society for Experimental Biology and Medicine
Proceedings of the Federation of American Societies for Experimental Biology
Science
Journal of Organic Chemistry
American Journal of Physiology
Biochemical and Biophysical Research Communications


Z - CZECHOSLOVAKIA


Collection of Czechoslovak Chemical Communications


*These are journals which we do not regularly receive. However, whenever articles in these journals are found or called to
our attention, we code them.


Table III


CODES DESIGNATING YEAR OF JOURNAL
(Punch in Column 73)


Year    Code        Year    Code

1958    11          1964    4
1959    12          1965    5
1960    0           1966    6
1961    1           1967    7
1962    2           1968    8
1963    3           1969    9
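
The year codes of Table III, together with the volume numbering described for column 74, can be expressed as a short lookup. This is a sketch only; the function names are assumptions, and the table is taken to cover 1958 through 1969 and no other years.

```python
# Year codes for column 73: 1958 and 1959 use 11 and 12, and the years
# 1960 through 1969 use their final digit.
YEAR_CODE = {1958: 11, 1959: 12}
YEAR_CODE.update({year: year - 1960 for year in range(1960, 1970)})


def year_code(year):
    """Column 73 code for a journal year covered by the table."""
    try:
        return YEAR_CODE[year]
    except KeyError:
        raise ValueError(f"no code listed for {year}") from None


def volume_number(year):
    """Arbitrary volume number for column 74: 1 for 1958, 2 for 1959, etc."""
    return year - 1957


print(year_code(1958), year_code(1960), year_code(1967))  # 11 0 7
print(volume_number(1959))  # 2
```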